fix(#1433): clear error for leading-underscore attribute names #1441

Merged
dimitri-yatsenko merged 2 commits into master from fix/1433-allow-leading-underscore-in-attribute-names
Apr 29, 2026

@dimitri-yatsenko dimitri-yatsenko commented Apr 29, 2026

Summary

Fixes #1433 by replacing the cryptic `pyparsing.ParseException` that users hit when declaring `_hidden: bool` with a clear `DataJointError` that explains the constraint and points to the alternative.

Design decision: don't allow user-defined hidden attributes

Earlier drafts of this PR loosened the parser to accept leading underscores, on the assumption that users should be able to declare hidden attributes the same way the framework does. On reflection, that's wrong:

  • Hidden attributes are filtered out of every public API surface — `fetch()`, dict restrictions, joins, `insert`/`update1`, `describe()`. The filter is intentional and reflects the platform-bookkeeping intent.
  • Platform-managed hidden columns (`_job_start_time`, `_job_duration`, `_job_version`, `_singleton`) are injected programmatically after parsing and populated via raw SQL during the `populate()` lifecycle (autopopulate.py). The parser never sees them.
  • Allowing users to declare hidden attributes would expose a feature with no public-API write path, no `describe()` round-trip, and silent filtering on dict restrictions and `insert(ignore_extra_fields=True)`. That's a footgun.
  • The cases users reach for hidden attributes — most commonly an index-backing derived column like `params_hash` — are better solved with a regular attribute. Backing a unique index isn't sufficient reason to hide a column. If application code computes, inserts, or queries the column, it should be a regular attribute.

So this PR keeps the parser strict and improves the error message instead.
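
As a rough illustration of the recommended pattern, the sketch below declares the hash as a regular attribute backing a unique index and computes it in application code. The table layout, the `config_id` key, the `compute_params_hash` helper, and the choice of MD5 are all hypothetical and chosen only so the digest fits `varchar(32)`; nothing here is taken from the DataJoint codebase.

```python
import hashlib
import json

# Hypothetical definition string: the hash is a regular (non-hidden) attribute
# that backs a unique index. Table layout and names are illustrative only.
DEFINITION = """
config_id   : int
---
tool        : varchar(64)
params_hash : varchar(32)
unique index (tool, params_hash)
"""


def compute_params_hash(params: dict) -> str:
    """Return a 32-character hex digest of the parameters (fits varchar(32))."""
    # Sorting keys makes the digest independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Application code computes the value and inserts it like any other attribute.
params = {"threshold": 4.5, "window": 30}
row = {
    "config_id": 1,
    "tool": "sorter",
    "params_hash": compute_params_hash(params),
}
```

Because `params_hash` is a regular attribute, it goes through the normal insert and fetch paths; callers that do not want to see it can narrow the result with `proj()` at the call site.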

Fix

A pre-flight check in `compile_attribute` (declare.py:858) detects a leading underscore and raises a `DataJointError` before the parser is invoked:

```
Attribute name in line "_hidden: bool" starts with an underscore.
Names with leading underscore are reserved for platform-managed
columns (e.g. _job_start_time, _singleton). Use a regular attribute
name; if you need to control visibility at the call site, use proj().
```

The parser regex is unchanged from master. Platform code is unaffected because `_job_*` and `_singleton` bypass `compile_attribute` entirely.
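
A minimal sketch of what such a pre-flight check might look like is shown here. It is not the code from declare.py: `check_leading_underscore` is a hypothetical helper, the `DataJointError` class is a local stand-in for the library's exception, and the regex is an assumption about how the leading underscore (after optional whitespace) could be detected.

```python
import re


class DataJointError(Exception):
    """Local stand-in for the library's DataJointError (assumption)."""


def check_leading_underscore(line: str) -> None:
    """Hypothetical helper: raise if the attribute line starts with '_'."""
    # Optional leading whitespace followed by an underscore.
    if re.match(r"\s*_", line):
        raise DataJointError(
            f'Attribute name in line "{line.strip()}" starts with an underscore. '
            "Names with leading underscore are reserved for platform-managed "
            "columns (e.g. _job_start_time, _singleton). Use a regular attribute "
            "name; if you need to control visibility at the call site, use proj()."
        )


for candidate in ("_hidden: bool", "   _params_hash: varchar(32)", "params_hash: varchar(32)"):
    try:
        check_leading_underscore(candidate)
        print(f"accepted: {candidate!r}")
    except DataJointError:
        print(f"rejected: {candidate!r}")
```

Running the check before the parser means the user sees the explanatory message rather than a parser-level exception.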

Companion docs PR

datajoint/datajoint-docs#162 — reworked to reflect the new design. The §3.4 section is now framed as platform-only, the `_params_hash` user-defined example is removed, and the table of behaviors is preserved as a reference for the platform-managed columns.

Test plan

  • `tests/unit/test_declare_hidden_attribute.py` — 6 tests: `compile_attribute` rejects `_hidden`, `_params_hash`, and leading-whitespace variants with the helpful message; parser still rejects leading `_`; parser still accepts plain names; parser still rejects digit-start.
  • Full unit test suite (255 tests) — all pass, no regressions.
  • `ruff check` and `ruff format` clean.
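
The parser-side expectations in the test plan can be illustrated with a regex stand-in for the naming rule described in this PR (first character from `[a-z]`, remaining characters from `[a-z0-9_]`). The real parser is built with pyparsing; `is_valid_attribute_name` is a hypothetical helper used only for this sketch.

```python
import re

# Regex stand-in for the naming rule: first character in [a-z],
# remaining characters in [a-z0-9_]. Not the pyparsing grammar itself.
ATTRIBUTE_NAME = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_attribute_name(name: str) -> bool:
    """Hypothetical helper: True if the whole name satisfies the rule."""
    return ATTRIBUTE_NAME.fullmatch(name) is not None


# Expected outcome for each kind of name listed in the test plan.
CASES = {
    "params_hash": True,    # plain name is accepted
    "_hidden": False,       # leading underscore is rejected
    "_params_hash": False,  # leading underscore is rejected
    "1abc": False,          # digit-start is rejected
}

for name, expected in CASES.items():
    assert is_valid_attribute_name(name) is expected
```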

The attribute_name parser in declare.py was pp.Word over [a-z] init chars
and [a-z0-9_] body chars, rejecting any name starting with `_`. But the
framework already treats names starting with `_` as hidden attributes
(Heading.attributes filters by is_hidden = name.startswith("_")), and
internal hidden columns like _job_start_time, _job_duration, _job_version,
and _singleton are injected programmatically, bypassing the parser.

User-defined hidden attributes — documented at
docs.datajoint.com/reference/specs/table-declaration/#34-hidden-attributes —
hit the parser and failed with a cryptic pyparsing ParseException.

Allow `_` in the init-chars set so user code like

    _params_hash: varchar(32)
    unique index (tool, _params_hash)

declares cleanly. Names starting with a digit are still rejected.

Fixes #1433

User-defined hidden attributes (names starting with `_`) are intentionally
not supported. The framework filters hidden columns out of every public
API surface — fetch, dict restriction, insert, update1, describe — and
populates platform-managed hidden columns (`_job_*`, `_singleton`) via
raw SQL during the populate() lifecycle, not via the user-facing methods.
Allowing users to declare hidden columns produces a feature with no
public-API write path, no describe() round-trip, and silent dict-
restriction filtering. The right fix for cases users reach for hidden
attributes (e.g. an index-backing hash like `params_hash`) is a regular
attribute.

Add a pre-flight check in compile_attribute that detects a leading
underscore and raises DataJointError with a clear message pointing to
the alternative, instead of leaking pyparsing internals:

    Attribute name in line "_hidden: bool" starts with an underscore.
    Names with leading underscore are reserved for platform-managed
    columns (e.g. _job_start_time, _singleton). Use a regular attribute
    name; if you need to control visibility at the call site, use proj().

Platform code is unaffected: `_job_*` and `_singleton` are injected
programmatically *after* parsing, so they bypass compile_attribute.

Replaces 7 unit tests asserting "parser accepts _" with 4 asserting
"compile_attribute rejects _ with helpful message" and "parser remains
strict".

Fixes #1433
@dimitri-yatsenko dimitri-yatsenko changed the title from "fix(#1433): allow leading underscore in attribute names" to "fix(#1433): clear error for leading-underscore attribute names" Apr 29, 2026
dimitri-yatsenko added a commit to datajoint/datajoint-docs that referenced this pull request Apr 29, 2026
Updated to reflect the design decision in datajoint/datajoint-python#1441:
the parser keeps rejecting leading-underscore attribute names and now
returns a clear DataJointError instead of a cryptic ParseException.

Reframe §3.4 around the platform-managed-only intent:

- Lead paragraph states up-front that user-defined hidden attributes are
  not supported, and shows the new error message users will see.
- Drop the "User-defined hidden attributes" subsection and the
  _params_hash hidden example.
- Keep the platform-attributes table and the behavior matrix — both are
  still useful for users encountering platform-managed hidden columns
  (_job_start_time, etc.) in fetch results, joins, and describe output.
- Add an explanation paragraph ("Why users can't declare them") covering
  the no-write-path / no-round-trip / silent-filter rationale.
- Replace the user-defined example with a regular-attribute example
  (params_hash backing a unique index), demonstrating the recommended
  pattern: declare as a regular attribute, use proj() at the call site
  for visibility control.
@dimitri-yatsenko dimitri-yatsenko merged commit db42c26 into master Apr 29, 2026
7 checks passed
@dimitri-yatsenko dimitri-yatsenko deleted the fix/1433-allow-leading-underscore-in-attribute-names branch April 29, 2026 21:41

Development

Successfully merging this pull request may close these issues.

Bug: Hidden attributes not permitted by declaration regex
